Abstract- This paper describes a deterministic algorithm for reconfiguring
a multibutterfly network with faulty switches. Unlike previous
reconfiguration algorithms, the algorithm is performed entirely by the
network, without the aid of any off-line computation, even though many of
the switches may be faulty. The algorithm reconfigures an $N$-input
multibutterfly network in $O(\log N)$ time. After reconfiguration, the
multibutterfly can tolerate $f$ worst-case faults and still route any
permutation between some set of $N - O(f)$ inputs and $N - O(f)$ outputs
in $O(\log N)$ time.

1  Introduction 

Recently Leighton and Maggs showed that a multibutterfly network can
sustain many faults and still route packets efficiently [6]. In particular,
they showed that an $N$-input multibutterfly network can tolerate $f$
worst-case faults, and still route any $\log N$ permutations from some set
of $N - O(f)$ inputs to some set of $N - O(f)$ outputs in $O(\log N)$ time.
In the case of random faults, the performance is even better. Even if every
switch fails with some fixed constant probability, an $N$-input
multibutterfly can route $\log N$ permutations between some set of
$\Theta(N)$ inputs and outputs in $O(\log N)$ time. (For a description of
related results see [6].)

The Leighton-Maggs strategy for tolerating faults consists of two parts.
First the network is reconfigured. Reconfiguring a network consists of
identifying those parts that contain too many faults to be useful for
routing, and removing them from the network. The goal is to leave intact as
much of the working hardware as possible, while maintaining the important
structural properties of the network. In the case of the multibutterfly, the
crucial property is expansion (defined in Section 2). The Leighton-Maggs
reconfiguration algorithm reduces this property somewhat, but otherwise
leaves it intact. As long as the reconfigured network has some expansion
property, it is possible to apply a routing algorithm that was designed to
run on a fault-free multibutterfly. Thus, the second part of the strategy is
to apply an off-the-shelf multibutterfly routing algorithm, such as Upfal's
permutation routing algorithm [13].

One of the drawbacks of the Leighton-Maggs reconfiguration algorithm
is that it is performed by an off-line computer with knowledge of the state
of the entire routing network. This paper presents an on-line algorithm for
reconfiguring the network in $O(\log N)$ time. The algorithm is performed
entirely by the network, even though many of its switches may be faulty.

The remainder of this paper consists of two sections. In Section 2 we
describe butterfly and multibutterfly networks. In Section 3 we review the
reconfiguration algorithm of Leighton and Maggs and then describe the
on-line algorithm.

2  Butterflies  and  Multibutterflies 

An example of an 8-input butterfly network is illustrated in Figure 1. The
nodes in this graph represent switches, and the edges represent wires. Each
node in the network has a distinct label $(r, l)$, where $r$ is the row, and
$l$ is the level. In a butterfly with $N$ inputs, the row is a $\log N$-bit
binary number and the level is an integer between 0 and $\log N$. The nodes
on level 0 and $\log N$ are called the inputs and outputs, respectively. For
$l < \log N$, a node labeled $(r, l)$ is connected to nodes $(r, l+1)$ and
$(r^{(l)}, l+1)$, where $r^{(l)}$ denotes $r$ with bit $l$ complemented
(bit 0 is the most significant, bit $\log N - 1$ the least).
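
The wiring rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `butterfly_neighbors` and the parameter `n_bits` (standing for $\log N$) are assumptions introduced here and do not appear in the paper.

```python
def butterfly_neighbors(r: int, l: int, n_bits: int) -> list[tuple[int, int]]:
    """Return the two level-(l+1) neighbors of node (r, l) in a butterfly
    with 2**n_bits inputs (hypothetical helper, not from the paper)."""
    if not 0 <= l < n_bits:
        raise ValueError("only levels 0 .. log N - 1 have outgoing edges")
    # Bit 0 is the most significant bit of the row, so bit l counted from
    # the top is bit (n_bits - 1 - l) counted from the bottom.
    flipped = r ^ (1 << (n_bits - 1 - l))
    return [(r, l + 1), (flipped, l + 1)]
```

For instance, in the 8-input case (`n_bits = 3`), node (0, 0) is connected to rows 0 and 4 on level 1, because complementing the most significant bit of 000 gives 100.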

In a butterfly, messages are typically sent from the switches on level 0,
called the inputs, to those on level $\log N$, called the outputs. In a
one-to-one routing problem, each input is the origin of at most one message,
and each output is the destination of at most one message. One-to-one
routing is also called permutation routing.

Figure 1: An 8-input butterfly network.

2.1  Dilated  butterflies 

Because  message  congestion  is  a  common  occurrence  in  real  networks,  the 
wires  in  butterfly  networks  are  typically  dilated,  so  that  each  wire  is  replaced 
by  a  channel  consisting  of  2  or  more  wires.  In  a  d-dilated  butterfly,  each 
channel  consists  of  d  wires.  Because  it  is  harder  to  congest  a  channel  than  it 
is  to  congest  a  single  wire  in  a  butterfly,  dilated  butterflies  are  better  routing 
networks  than  simple  butterflies  [3,  4,  11,  12]. 

2.2  Splitter  networks 

Butterfly and dilated butterfly networks belong to a larger class of networks
called splitter networks. The switches on each level of a splitter network
can be partitioned into blocks. All of the switches on level 0 belong to the
same block. On level 1, there are two blocks, one consisting of the switches
that are in the upper $N/2$ rows, and the other consisting of the switches
that are in the lower $N/2$ rows. In general, the switches in a block $B$ of
size $M = N/2^l$ on level $l$ have neighbors in two blocks, $B_u$ and $B_l$,
on level $l+1$, where $u$ stands for upper and $l$ for lower. The upper
block, $B_u$, contains the switches on level $l+1$ that are in the same rows
as the upper $M/2$ switches of $B$. The lower block, $B_l$, consists of the
switches that are in the same rows as the lower $M/2$ switches of $B$. The
edges from $B$ to $B_u$ are called the up edges, and those from $B$ to $B_l$
are called the down edges. The three blocks, $B$, $B_u$, and $B_l$, and the
edges between them are collectively called a splitter. The switches in $B$
are called the splitter inputs, and those in $B_u$ and $B_l$ are called the
splitter outputs. In a splitter network with multiplicity $d$, each splitter
input is incident to $d$ outgoing up edges and $d$ outgoing down edges, and
each splitter output is incident to $2d$ incoming edges. In a $d$-dilated
butterfly, the $d$ up (and $d$ down) edges incident to each splitter input
all lead to the same splitter output, but this need not be the case in
general. For example, we have illustrated an 8-input splitter network with
multiplicity 2 in Figure 2.

Figure 2: An 8-input splitter network with multiplicity 2.

In  a  splitter  network,  each  input  and  output  are  connected  by  a  single 
logical (up-down) path through the blocks of the network. For example,
Figure 3 shows the logical path from any input to output 011. In a butterfly,
this  logical  path  specifies  a  unique  path  through  the  network,  since  only  one 
up  and  one  down  edge  emanate  from  each  switch.  (In  fact,  a  splitter  network 
with  multiplicity  one  is  very  similar  to  a  delta  network  [5].)  In  a  general 
splitter  network  with  multiplicity  d,  however,  each  switch  will  have  d  up  and 
d  down  edges,  and  each  step  of  the  logical  path  can  be  taken  on  any  one 
of  d  edges.  Hence,  one  logical  path  can  be  realized  by  a  myriad  of  physical 
paths  in  a  general  splitter  network. 
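
As a small sketch of how a logical path is determined, the following Python function reads the up/down decisions off the bits of the destination row. The function name `logical_path` is hypothetical, and the code assumes the convention that the upper block at each level corresponds to a 0 bit (lower-numbered rows drawn at the top), which the text does not state explicitly.

```python
def logical_path(output_row: int, n_bits: int) -> list[str]:
    """Return the sequence of 'up'/'down' moves leading to output_row.

    Assumption: bit 0 (most significant) selects the block on level 1,
    bit 1 the block on level 2, and so on; a 0 bit means 'up'.
    """
    bits = format(output_row, f"0{n_bits}b")  # most significant bit first
    return ["up" if b == "0" else "down" for b in bits]
```

Under this convention, `logical_path(0b011, 3)` yields up, down, down, which matches the path to output 011 shown in Figure 3. In a splitter network with multiplicity $d$, each of these moves can be realized on any one of $d$ physical edges.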


Figure 3: The logical path (up, down, down) from any input to output 011.

2.3 Randomly-wired splitter networks and multibutterflies

In this paper, we are primarily concerned with randomly-wired splitter
networks. A randomly-wired splitter network is a splitter network where the
up and down edges within each splitter are chosen at random subject to the
constraint that each splitter input is incident to $d$ up and $d$ down edges,
and each splitter output is incident to $2d$ incoming edges.

The crucial property that randomly-wired splitter networks are likely to
possess is known as expansion. In particular, an $M$-input splitter is said
to have $(\alpha, \beta)$-expansion if every set of $k \le \alpha M$ inputs
is connected to at least $\beta k$ up outputs and $\beta k$ down outputs,
where $\alpha > 0$ and $\beta > 1$ are fixed constants. For example, see
Figure 4.
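
To make the definitions concrete, here is a minimal Python sketch that wires one side (up or down) of a splitter at random and measures its expansion by brute force. The names `random_splitter` and `worst_case_expansion` are assumptions introduced for illustration. The wiring uses a simple shuffle of output "slots", which respects the degree constraints but may produce parallel edges; the paper does not specify how the random wiring is generated. The brute-force check is exponential in the set size and is only meant for very small examples.

```python
import random
from itertools import combinations

def random_splitter(m: int, d: int, seed: int = 0):
    """Return (up, down) for an m-input splitter with multiplicity d.
    up[i] lists the d upper outputs reached by input i (likewise down)."""
    rng = random.Random(seed)
    half = m // 2  # each output block has m/2 switches

    def wire():
        # Every output gets 2d incoming slots; inputs take d slots each.
        slots = [o for o in range(half) for _ in range(2 * d)]
        rng.shuffle(slots)
        return [slots[i * d:(i + 1) * d] for i in range(m)]

    return wire(), wire()

def worst_case_expansion(adj, alpha: float) -> float:
    """Minimum of |N(S)| / |S| over nonempty input sets S, |S| <= alpha*m."""
    m = len(adj)
    worst = float("inf")
    for k in range(1, int(alpha * m) + 1):
        for s in combinations(range(m), k):
            nbrs = set().union(*(adj[i] for i in s))
            worst = min(worst, len(nbrs) / k)
    return worst
```

A side of a splitter has $(\alpha, \beta)$-expansion in the sense above when `worst_case_expansion(adj, alpha)` is at least $\beta$ for both the up and the down adjacency lists.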

A splitter network is said to have $(\alpha, \beta)$-expansion if all of its
splitters have $(\alpha, \beta)$-expansion. More simply, a splitter or a
splitter network is said to have expansion if it has
$(\alpha, \beta)$-expansion for some constants $\alpha > 0$ and $\beta > 1$.
A splitter network with expansion is more commonly known as a multibutterfly
[13], and a multibutterfly with $(\alpha, \beta)$-expansion and multiplicity
$d$ consists of splitters in which each splitter input is incident to $d$ up
and $d$ down edges and for which any $k \le \alpha M$ splitter inputs are
adjacent to at least $\beta k$ up and $\beta k$ down splitter outputs.

Splitters with expansion are known to exist for any $d \ge 3$, and they
can be constructed deterministically in polynomial time [2, 10, 13], but
randomized wirings typically provide better expansion. A discussion of the
tradeoffs between $\alpha$ and $\beta$ in randomly-wired splitters can be
found in [7, 13]. For the purposes of this paper, two facts are needed.
First, for fixed $d$ and sufficiently small $\alpha$, the expansion,
$\beta$, of a randomly-wired splitter will be close to $d - 1$ with
probability close to 1. Second, for fixed $\alpha$ and sufficiently large
$d$, $\alpha\beta$ will be close to 1/2 (the best possible) with probability
close to 1. It is not known if it is possible for both $\beta$ to be close
to $d - 1$ and $\alpha\beta$ to be close to 1/2 simultaneously.

Figure 4: An $M$-input splitter with $(\alpha, \beta)$-expansion.

A multibutterfly with $(\alpha, \beta)$-expansion is good at routing because
one must block $\beta k$ splitter outputs in order to block $k$ splitter
inputs. In classical networks such as the butterfly, the reverse is true: it
is possible to block $2k$ inputs by blocking only $k$ outputs. When this
effect is compounded over several levels, the effect is dramatic. In a
butterfly, a single fault can block $2^l$ switches $l$ levels back, whereas
in a multibutterfly, it takes $\beta^l$ faults to block a single switch $l$
levels back.

3 Routing around faults

In this section, we present an $O(\log N)$ time on-line algorithm for
reconfiguring a multibutterfly network in the presence of faults. We begin
in Section 3.1 by describing the fault model. In Section 3.2 we review the
off-line algorithm of Leighton and Maggs. Next, in Section 3.3, we describe
the on-line algorithm. To simplify the presentation of the algorithm, we
augment the multibutterfly with some additional edges. These edges increase
the size and VLSI layout area of the network by at most a constant factor.
As it turns out, this additional hardware is not really necessary. We
conclude in Section 3.4 by explaining how to implement the algorithm without
using these extra edges.

3.1  The  fault  model 

The  reconfiguration  algorithm  and  the  routing  algorithms  in  [6]  tolerate 
static,  non-malicious  faults  in  the  switches.  In  the  static  fault  model,  some 
faulty  switches  may  be  produced  by  the  manufacturing  process  but  once  the 
network  has  been  manufactured,  no  working  switch  ever  fails,  and  no  faulty 
switch  ever  begins  to  work.  We  shall  assume  that  failures  are  non-malicious 
in  the  sense  that  a  working  switch  can  query  any  one  of  its  neighbors  and 
determine  if  that  neighbor  is  faulty  in  constant  time. 

3.2  The  Leighton-Maggs  algorithm 

The Leighton-Maggs algorithm consists of two parts. The first part, called
erasure, removes some of the outputs from the network. The second part,
called fault propagation, removes some of the inputs. The goal of the
reconfiguration algorithm is to leave intact a large working subnetwork in
which every input can reach every output, and in which the splitters have
$(\alpha, \beta')$-expansion, where $\beta'$ may be less than $\beta$, but
must be greater than one. The proof that the multibutterfly can route
$\log N$ permutations in $O(\log N)$ time holds for any expansion $\beta$
greater than one [6, 8, 13]. By a similar argument, if $\beta' > 1$, the
subnetwork also can route any $\log N$ permutations between its inputs and
outputs in $O(\log N)$ time [6, 8].

The erasure part of the algorithm consists of removing those splitters
that contain too many faults. This step requires some off-line computation.
Each splitter in the multibutterfly is examined, and if more than an
$\varepsilon$ fraction of its input switches are faulty, where
$\varepsilon = 2\alpha(\beta' - 1)$ and $\beta' = \beta - \lfloor d/2 \rfloor$,
then the splitter is "erased" from the network, as are all of the switches
and outputs on level $\log N$ that can be reached from the splitter. In the
next section, we will present an on-line algorithm for counting the number
of faults in each splitter.
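
The erasure test can be written down directly. The sketch below assumes the threshold reads $\varepsilon = 2\alpha(\beta' - 1)$ with $\beta' = \beta - \lfloor d/2 \rfloor$; the function names `erasure_threshold` and `should_erase` are hypothetical and chosen only for this illustration.

```python
def erasure_threshold(alpha: float, beta: float, d: int) -> tuple[float, float]:
    """Return (beta_prime, eps) for a splitter with (alpha, beta)-expansion
    and multiplicity d, assuming eps = 2*alpha*(beta' - 1)."""
    beta_prime = beta - d // 2          # beta' = beta - floor(d/2)
    if beta_prime <= 1:
        raise ValueError("the reduced expansion beta' must exceed one")
    eps = 2 * alpha * (beta_prime - 1)
    return beta_prime, eps

def should_erase(num_faulty_inputs: int, num_inputs: int, eps: float) -> bool:
    """A splitter is erased if more than an eps fraction of inputs are faulty."""
    return num_faulty_inputs > eps * num_inputs
```

For example, with the illustrative values $\alpha = 0.01$, $\beta = 9$, and $d = 10$, the reduced expansion is $\beta' = 4$ and a 100-input splitter is erased once it has more than about 6 faulty inputs.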

The second part of the algorithm, fault propagation, is executed by
the switches themselves. Working from level $\log N$ backwards, each switch
checks if at least half of its upper output edges lead to faulty switches
that have not been erased, or if at least half of its lower output edges
lead to faulty switches that have not been erased. If so, then it declares
itself to be faulty (but does not erase itself). Such a fault is called a
propagated fault.
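
The backward sweep can be sketched sequentially as follows. This is a simplified simulation, not the distributed procedure itself: the data layout (`up[l][r]` and `down[l][r]` listing the rows on level $l+1$ reached by switch $(r, l)$) and the function name `propagate_faults` are assumptions made for the example.

```python
def propagate_faults(up, down, faulty, erased):
    """Return the set of propagated faults as (row, level) pairs.

    up[l][r] / down[l][r]: rows on level l+1 reached by switch (r, l).
    faulty, erased: sets of (row, level) pairs known before propagation.
    """
    bad = set(faulty)
    propagated = set()
    # Sweep from the level just before the outputs back to level 0.
    for l in range(len(up) - 1, -1, -1):
        for r in range(len(up[l])):
            if (r, l) in bad or (r, l) in erased:
                continue
            for edges in (up[l][r], down[l][r]):
                blocked = sum(1 for t in edges
                              if (t, l + 1) in bad and (t, l + 1) not in erased)
                # "At least half" of the edges on one side lead to
                # faulty, unerased switches.
                if edges and 2 * blocked >= len(edges):
                    bad.add((r, l))
                    propagated.add((r, l))
                    break
    return propagated
```

Note that erased switches are ignored when counting: an edge into an erased block does not count against the switch, which is why erasure is performed before propagation.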

Finally, all of the remaining faulty switches are erased. Since every
remaining input in every splitter is linked to at least $\lceil d/2 \rceil$
working upper outputs (if the descendant multibutterfly outputs exist) and
$\lceil d/2 \rceil$ working lower outputs (if the corresponding
multibutterfly outputs exist), the network has $(\alpha, \beta')$-expansion.

The  following  pair  of  lemmas  bounds  the  number  of  removed  inputs  and 
outputs  in  the  case  of  worst-case  faults  and  random  faults,  respectively. 

Lemma 3.1 ([6]) Suppose that there are $f$ faults in the network. Then the
erasure process removes at most $f/\varepsilon = O(f)$ outputs, and there
will be at most $O(f)$ propagated faults on any level.

Lemma 3.2 ([6]) There exist fixed constants $p > 0$ and $\lambda > 0$, such
that if each switch fails independently with probability $p$, then with
probability at least $1 - \ldots$ the erasure and fault propagation
processes leave behind at least $\lambda N$ inputs and $\lambda N$ outputs.

3.3  On-line  reconfiguration 

This section presents an algorithm for determining which switches to remove
from the network in $O(\log N)$ time. The algorithm is on-line in the sense
that the computation is performed entirely by the switches, without the aid
of any off-line computation. As in Section 3.2, the reconfiguration of the
network consists of two parts. First, each splitter must determine if the
number of faults in its input block exceeds an $\varepsilon$ fraction, and,
if so, then
it  must  erase  itself.  This  part  is  difficult  because  each  splitter  must  count 
its  own  faults  and  distribute  the  count  to  its  working  switches,  even  though 
the  splitter  itself  may  contain  many  faults.  Second,  faults  are  propagated 
from  the  outputs  of  the  network  towards  the  inputs.  A  switch  is  declared 
faulty  if  more  than  d/2  of  its  upper  or  lower  output  edges  lead  to  switches 
that  are  faulty,  but  not  erased.  After  erasure  and  fault  propagation,  all  of 
the  remaining  faulty  switches  are  erased. 

The erasure part consists of two tasks. First, we must identify those
blocks that contain too many faults and must be erased. Then for each
splitter, each input switch must be told if either of the two output blocks
in the splitter has been erased. To help with these tasks, we add some edges
to the network. In particular, each switch is connected in a random fashion
to $d_2$ other switches in the same block. These edges will be used solely
for the purpose of counting, and not for routing. They increase the VLSI
layout area of the network by at most a constant factor. For sufficiently
small, but fixed, $\alpha_2 > 0$, with probability close to 1, every set $S$
of $k \le \alpha_2 M$ switches in a block of size $M$ will have at least
$\beta_2 k$ neighbors. We will choose $\alpha_2$ to be small so that
$\beta_2$ will be close to $d_2 - 1$.

The erasure algorithm begins by repeating the following basic step $\delta$
times, where $\delta = \lceil \log N / \log \beta_2' \rceil + 1$, and
$\beta_2' = \beta_2 - \lfloor d_2/2 \rfloor$. Initially, every working switch
in the network is awake. At each basic step, each switch that is awake
examines its $d_2$ neighbors within the same block, and falls asleep if more
than $\lfloor d_2/2 \rfloor$ of them are faulty or asleep. The following
lemma shows that if there were not too many faults to begin with, then few
working switches fall asleep.
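
The basic step is easy to simulate. The following sketch assumes the number of steps is $\delta = \lceil \log N / \log \beta_2' \rceil + 1$ and that a switch falls asleep when more than $\lfloor d_2/2 \rfloor$ of its block neighbors are faulty or asleep; the names `num_basic_steps` and `run_basic_steps`, and the adjacency-list representation, are assumptions for this example.

```python
import math

def num_basic_steps(n: int, beta2_prime: float) -> int:
    """delta = ceil(log N / log beta2') + 1 (requires beta2' > 1)."""
    return math.ceil(math.log(n) / math.log(beta2_prime)) + 1

def run_basic_steps(neighbors, faulty, steps):
    """Return the set of working switches that are asleep after `steps`
    synchronous basic steps. neighbors[v] lists v's d2 block neighbors."""
    asleep = set()
    for _ in range(steps):
        # All switches look at the state from the end of the previous step.
        newly = {
            v for v in range(len(neighbors))
            if v not in faulty and v not in asleep
            and sum(1 for u in neighbors[v] if u in faulty or u in asleep)
                > len(neighbors[v]) // 2
        }
        if not newly:
            break  # no switch changes state, so later steps do nothing
        asleep |= newly
    return asleep
```

In a tiny block with a large fraction of faults, sleep can cascade through every working switch; the lemma below states that this does not happen when the number of faults is small relative to the block size.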

Lemma 3.3 Let $t$ denote the number of faults in a block of size $M$, and let
$s$ denote the number of working switches that fall asleep. If
$t < \varepsilon_2 M$, where
$\varepsilon_2 = \alpha_2(\beta_2 - \lfloor d_2/2 \rfloor - 1)$, then
$s \le \alpha_2 M$ and $s \le t/(\beta_2 - \lfloor d_2/2 \rfloor - 1)$.

Proof: The proof is by induction on the number of basic steps. The base
case is trivial, since initially no working switches are asleep. Now let
$U_i$ be the set of working switches that are asleep at the end of step $i$.
Suppose that $|U_i| > \alpha_2 M$. Then

\[ |U_{i-1}| + t \ge (\beta_2 - \lfloor d_2/2 \rfloor)\,\alpha_2 M, \]

since the switches in $U_i$ have at least $\beta_2 \alpha_2 M$ neighbors, and
each switch in $U_i$ has at most $\lfloor d_2/2 \rfloor$ neighbors that are
not asleep or faulty. Since $|U_{i-1}| \le \alpha_2 M$ by induction, and
$t < \varepsilon_2 M$, we have a contradiction. Thus $|U_i| \le \alpha_2 M$.
As a consequence, the switches in $U_i$ have at least $\beta_2 |U_i|$
neighbors. Thus,

\[ |U_{i-1}| + t \ge (\beta_2 - \lfloor d_2/2 \rfloor)\,|U_i|. \]

Now suppose that $|U_i| > t/(\beta_2 - \lfloor d_2/2 \rfloor - 1)$. Since
$|U_{i-1}| \le t/(\beta_2 - \lfloor d_2/2 \rfloor - 1)$ by induction, we have
a contradiction. □

The next lemma shows that if any working switch is awake after step
$\delta$, then it is connected to many nearby working switches.

Lemma 3.4 If a switch $r$ in a block of size $M$ is awake after step
$\delta$, then at least $\alpha_2 \beta_2' M$ working switches can be reached
from $r$ along paths of length at most $\delta$ that pass through only
working switches.

Proof: If $r$ is still awake at the end of step $i$, then $r$ must have had
at least $\beta_2' = \beta_2 - \lfloor d_2/2 \rfloor$ neighbors that were
awake at the end of step $i - 1$. In turn, these neighbors must have had at
least $\beta_2'^2$ neighbors that were awake at the end of step $i - 2$,
provided that $\beta_2' \le \alpha_2 M$. Otherwise, these switches had at
least $\alpha_2 \beta_2' M$ neighbors that were awake. In general, $r$ can
reach a set of $\min\{\beta_2'^j, \alpha_2 \beta_2' M\}$ switches that were
awake at the end of step $i - j$. Furthermore, there is a path of length $j$
from $r$ to any switch in this set that passes only through working
switches. Choosing $j = \delta$, so that
$\min\{\beta_2'^j, \alpha_2 \beta_2' M\} = \alpha_2 \beta_2' M$ (since
$\beta_2'^{\delta} \ge N$), and choosing $i = \delta$ completes the proof. □

Now  the  erasure  algorithm  must  determine,  for  each  splitter,  whether 
its  output  blocks  contain  too  many  faults,  and  it  must  inform  each  input 
switch  if  either  of  the  two  output  blocks  must  be  erased. 

In order to count the number of faulty switches in the output blocks,
the switches in each input block organize themselves into trees. Suppose
that some switch $r$ in a block of size $M$ remains awake for $\delta$
steps. We call $r$ a ruler. Each ruler attempts to form a depth-$\delta$
breadth-first spanning tree of the working switches that it can reach with
itself as the root. If $r$ were the only ruler, then the number of steps
required to form the spanning tree would be at most $\delta$. However, since
each ruler simultaneously attempts to form a spanning tree, there will be
conflicts when the trees overlap. To resolve these conflicts, we will assume
that each switch in the block possesses a distinct label. Each time a switch
is added to a spanning tree, it is given the label of the spanning tree's
ruler. If several trees attempt to add the same switch, then the one with
the smallest label succeeds, even if the switch must be removed from another
tree. Since the growth of the spanning tree with the smallest label is
unimpeded by the other spanning trees, after $\delta$ steps, it will contain
at least $\alpha_2 \beta_2' M$ switches.

Next, in $\delta$ steps, each ruler counts the number of switches in its
spanning tree. If the total is at least $\alpha_2 \beta_2' M$, then it
broadcasts a message to the switches in the tree, telling them that they
belong to a large tree.
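
The tree-growing and counting steps can be approximated by synchronous label flooding, sketched below. This is a simplification made for illustration: it assumes the block edges can be used in both directions, it represents each tree only by the ruler label and a parent pointer per switch, and it does not model the exact message exchange. The names `grow_trees` and `tree_sizes` are hypothetical.

```python
from collections import Counter

def grow_trees(neighbors, working, rulers, delta):
    """Run delta synchronous rounds. Returns (label, parent), where
    label[v] is the ruler whose tree holds v (None if unreached)."""
    label = {v: (v if v in rulers else None) for v in working}
    parent = {v: None for v in working}
    for _ in range(delta):
        new_label, new_parent = dict(label), dict(parent)
        for v in working:
            for u in neighbors[v]:
                if u in working and label[u] is not None:
                    # The tree with the smallest ruler label wins, even if
                    # v already belongs to another tree.
                    if new_label[v] is None or label[u] < new_label[v]:
                        new_label[v], new_parent[v] = label[u], u
        label, parent = new_label, new_parent
    return label, parent

def tree_sizes(label):
    """Number of switches currently carrying each ruler's label."""
    return Counter(l for l in label.values() if l is not None)
```

A ruler whose entry in `tree_sizes` reaches the threshold $\alpha_2 \beta_2' M$ would then announce to its members that they belong to a large tree. In this sketch the tree with the smallest label is never blocked, which mirrors the argument in the text.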

Now each large tree makes an approximate count of the number of
switches that are awake in the upper and lower output blocks. In order
to perform this task, a third set of edges is added to the graph. For each
input switch, $d_3$ edges are added to switches in both the upper and lower
output blocks at the next level. The edges are inserted at random
so that each set of $k \le \alpha_3 M$ switches in a block of size $M$ has
at least $\beta_3 k$ neighbors in both the upper and lower blocks, where
$\alpha_3 \beta_3 < 1/2$. These edges increase the VLSI layout area of the
network by at most a constant factor. We will choose
$\alpha_3 = \alpha_2 \beta_2'$ so that a tree of size $\alpha_2 \beta_2' M$
will have at least $\alpha_2 \beta_2' \beta_3 M$ neighbors in each output
block, and we will choose $d_3$ to be large so that
$\alpha_2 \beta_2' \beta_3 > (1 - (\alpha_2 + \varepsilon_2))/2$. In
$\delta$ steps, each large tree sums up the number of different switches in
the upper output block that are awake and have a neighbor in the tree. It
then does the same for the lower output block.

If any large tree in the input block counts more than
$(1 - 2(\varepsilon_2 + \alpha_2))M/2$ switches that are awake in an output
block of size $M/2$, then it marks all of those switches, and the block will
not be erased. If no switch in an output block is marked, then the block
will be erased. The following lemma bounds the number of network outputs
that are erased.

Lemma 3.5 Let $f$ denote the number of faults in the entire network. Then
the total number of erased network outputs is at most $f/\varepsilon_2$.

Proof: If an output block of size $M/2$ has fewer than $\varepsilon_2 M/2$
faults, then by Lemma 3.3, after $\delta$ steps it will have at most
$(\varepsilon_2 + \alpha_2)M/2$ faulty and asleep switches. Since the
switches in each large tree have at least
$(1 - (\varepsilon_2 + \alpha_2))M/2$ neighbors in each output block, at
least $(1 - 2(\varepsilon_2 + \alpha_2))M/2$ of those neighbors must be
awake. These neighbors will all be marked and the block will not be erased.
Thus, if an output block is erased, then it must have had at least an
$\varepsilon_2$ fraction of faulty switches to begin with. □

After the large trees have marked switches in the output blocks that
are not to be erased, the rest of the input switches that are awake must be
informed. First, every working input switch (awake or asleep) queries its
$d_3$ upper output neighbors to determine if they are marked. If any of them
are, then the input switch colors itself. Then the following coloring step
is repeated $\delta$ times. If any of an input's $d_2$ neighbors in the
input block are colored, then the switch colors itself. (The same algorithm
is then applied to the lower output block.) The following lemma shows that
after $\delta$ steps, each input switch will know if the upper output block
has been erased.
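
The coloring phase amounts to a bounded flood through the input block. The sketch below is a sequential simulation under assumed data structures (`block_neighbors[v]` for the $d_2$ same-block neighbors and `output_neighbors[v]` for the $d_3$ upper output neighbors of input $v$); the function name `color_inputs` is hypothetical.

```python
def color_inputs(block_neighbors, output_neighbors, working, marked, delta):
    """Return the set of working input switches that are colored after the
    initial query and delta synchronous coloring steps."""
    # Initial query: a working input colors itself if it sees a marked
    # switch among its upper output neighbors.
    colored = {v for v in working
               if any(o in marked for o in output_neighbors[v])}
    for _ in range(delta):
        # Each uncolored working input looks at the previous round's colors.
        colored |= {v for v in working
                    if v not in colored
                    and any(u in colored for u in block_neighbors[v])}
    return colored
```

A working input that ends up uncolored concludes that the upper output block was erased. Faulty switches never relay a color in this sketch, so the guarantee depends on awake switches being well connected through working switches, as the next lemma argues.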

Lemma 3.6 If any switch in an upper output block is marked, then every
awake switch in the input block will be colored in $\delta$ coloring steps,
provided that $\alpha_2 \beta_2' \beta_3 > 1/4$.

Proof: If any switch in an upper output block of size $M/2$ is marked, then
at least $\alpha_2 \beta_2' \beta_3 M$ of them are marked, which for
$\alpha_2 \beta_2' \beta_3 > 1/4$ is more than half of the switches in the
block. By Lemma 3.4, each input switch in a block of size $M$ that is awake
can reach at least $\alpha_2 \beta_2' M$ other working switches via paths of
length at most $\delta$. These switches have at least
$\alpha_2 \beta_2' \beta_3 M$ neighbors in the upper output block, which for
$\alpha_2 \beta_2' \beta_3 > 1/4$ is more than half of the switches in the
block. If more than half of the switches in the upper output block are
marked, and more than half of the switches are neighbors, then at least one
neighbor is marked. □

The last step before fault propagation is to declare any switch that is
asleep to be faulty. The following lemma shows that the blocks that are not
erased contain at most a $2(\varepsilon_2 + \alpha_2)$ fraction of faulty and
asleep switches.

Lemma 3.7 If a block of size $M/2$ is not erased, then it has at most
$(\varepsilon_2 + \alpha_2)M$ faulty or asleep switches.

Proof: If an output block of size $M/2$ has more than
$(\varepsilon_2 + \alpha_2)M$ faulty or asleep switches, then every large
tree in the corresponding input block has at most
$(1 - 2(\varepsilon_2 + \alpha_2))M/2$ neighbors in the output block that are
awake, and none of those neighbors will be marked. □

The algorithm for propagating faults from the outputs to the inputs in
$O(\log N)$ time is the same as the propagation algorithm from the
Leighton-Maggs algorithm. It consists of $\log N$ stages, numbered 1 through
$\log N$. At stage $i$, each switch on level $\log N - i$ counts the number
of faulty neighbors it has on level $\log N - i + 1$. If more than half of
its upper or lower outputs lead to faulty switches that have not been
erased, then the switch declares itself to be faulty. Otherwise it does
nothing. We will choose $2(\varepsilon_2 + \alpha_2) < \varepsilon$ so that
each unerased block has at most an $\varepsilon$ fraction of faulty
switches. As a consequence, we can apply Lemmas 3.1 and 3.2 to bound the
number of faults that propagate to the inputs.

3.3.1 A final look at the constants

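At this point, it seems wise to verify that all of the constraints on the
constants $\alpha$, $\beta$, $d$, $\varepsilon$, $\alpha_2$, $\beta_2$,
$d_2$, $\varepsilon_2$, $\alpha_3$, $\beta_3$, and $d_3$ can be satisfied.
First, we must choose $\alpha$ small so that $\beta$ is close to $d - 1$.
Second, we choose $\alpha_2$ small so that $\beta_2$ is close to $d_2 - 1$.
This choice will make
$\varepsilon_2 = \alpha_2(\beta_2 - \lfloor d_2/2 \rfloor - 1)$ small. Third,
we choose $\alpha_3 = \alpha_2 \beta_2'$, and $d_3$ large so that
$\alpha_3 \beta_3$ is close to 1/2. In particular, we need
$\alpha_3 \beta_3 > (1 - (\varepsilon_2 + \alpha_2))/2$ and
$\alpha_3 \beta_3 > 1/4$. Finally, we need
$2(\varepsilon_2 + \alpha_2) < \varepsilon$.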
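
The constraints can be checked mechanically for any candidate choice of constants. The sketch below assumes the relations $\beta' = \beta - \lfloor d/2 \rfloor$, $\varepsilon = 2\alpha(\beta' - 1)$, $\beta_2' = \beta_2 - \lfloor d_2/2 \rfloor$, $\varepsilon_2 = \alpha_2(\beta_2' - 1)$, and $\alpha_3 = \alpha_2 \beta_2'$; the function name `check_constants` is hypothetical. It only tests the inequalities and says nothing about whether a random wiring actually achieves the given expansion values.

```python
def check_constants(alpha, beta, d, alpha2, beta2, d2, beta3):
    """Return a dict mapping each constraint to True/False."""
    beta_p = beta - d // 2               # beta'
    eps = 2 * alpha * (beta_p - 1)       # erasure threshold epsilon
    beta2_p = beta2 - d2 // 2            # beta_2'
    eps2 = alpha2 * (beta2_p - 1)        # epsilon_2
    alpha3 = alpha2 * beta2_p            # alpha_3 = alpha_2 * beta_2'
    prod = alpha3 * beta3
    return {
        "beta' > 1": beta_p > 1,
        "beta2' > 1": beta2_p > 1,
        "alpha3*beta3 <= 1/2": prod <= 0.5,
        "alpha3*beta3 > (1 - (eps2 + alpha2))/2":
            prod > (1 - (eps2 + alpha2)) / 2,
        "alpha3*beta3 > 1/4": prod > 0.25,
        "2*(eps2 + alpha2) < eps": 2 * (eps2 + alpha2) < eps,
    }
```

As an illustration with made-up values, `check_constants(0.01, 19, 20, 0.001, 19, 20, 55.3)` satisfies every constraint, whereas lowering `beta3` to 20 violates the two lower bounds on $\alpha_3 \beta_3$.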

3.4  Removing  the  additional  edges 

The algorithm of Section 3.3 augments the multibutterfly network with two
types of edges. First $d_2$ edges are added from each switch to switches in
the same block. Then $d_3$ edges are added from each switch to both the upper
and lower blocks at the next level. The second type of edges are easily
removed. Their tasks can be performed by the $d$ routing edges leading from
each switch to the upper and lower blocks at the next level.

Removing the first type of edges is more problematic. The basic idea
is to simulate them using the $d$ routing edges. We begin by observing that
a randomly-wired splitter is likely to have expansion both from the inputs
to the outputs and from the outputs to the inputs [7, 13]. Let
$(\alpha_4, \beta_4)$ be the expansion property from the output blocks to
their input block. Then $\alpha_4 \beta_4 < 1$, and $\beta_4$ will be close
to $2d - 1$, provided that $\alpha_4$ is sufficiently small. A set of
$k \le \alpha M$ input switches in a block of size $M$ has at least
$2\beta k$ output neighbors (counting those in both the upper and lower
blocks). These outputs in turn have at least $2\beta_4 \beta k$ input
neighbors, provided $2\beta k \le \alpha_4 M$. Thus, as long as none of the
output neighbors are faulty, they can be used to simulate expansion
$2\beta_4 \beta$ within the block. This expansion can be used in place of
$\beta_2$ in the algorithm of Section 3.3. What if some of these output
neighbors are faulty? This problem can be solved by declaring a switch to be
faulty if any of its output neighbors are faulty (without propagating any
faults) before the reconfiguration process begins. This trick may multiply
the number of faults in the network by a factor of $2d$, but if a switch
survives then all of its output neighbors were initially working.
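
Both ideas in this section, the one-round pre-marking and the simulation of block edges by two-hop paths through the outputs, can be sketched briefly. The function names `premark_faults` and `simulated_block_neighbors` and the dictionary representation of the routing edges are assumptions for illustration.

```python
def premark_faults(out_neighbors, faulty):
    """Return the enlarged fault set: the original faults plus every switch
    with at least one originally faulty output neighbor. Only the original
    faults are consulted, so nothing propagates further."""
    return set(faulty) | {v for v, outs in out_neighbors.items()
                          if any(o in faulty for o in outs)}

def simulated_block_neighbors(out_neighbors):
    """Map each input to the inputs it can reach by going forward along a
    routing edge and back along another (its simulated block neighbors)."""
    inputs_of = {}
    for v, outs in out_neighbors.items():
        for o in outs:
            inputs_of.setdefault(o, set()).add(v)
    return {v: set().union(*(inputs_of[o] for o in outs)) - {v}
            for v, outs in out_neighbors.items()}
```

After `premark_faults`, every surviving input has only working output neighbors, so each of its two-hop paths passes through a working intermediate switch.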

4 Remarks

The  techniques  described  in  this  paper  can  also  be  applied  to  other  networks 
whose  underlying  structures  are  trees,  and  whose  blocks  are  connected  by 
expanding  graphs.  One  example  is  a  class  of  networks  called  multi-fat-trees, 
which are based on the fat-tree networks of Leiserson and Greenberg [1, 9].